A novel syllable duration modeling approach for Mandarin speech
Abstract
In this paper, a novel syllable duration modeling approach for Mandarin speech is proposed. It explicitly treats several major affecting factors as multiplicative companding parameters and estimates all model parameters with an EM algorithm. Experimental results showed that the variance of the observed syllable duration was greatly reduced, from 183.4 frames to 18.5 frames (1 frame = 5 ms), by eliminating the effects of these affecting factors. In addition, the estimated companding values of these affecting factors agreed well with our prior linguistic knowledge. A preliminary study applying the proposed model to syllable duration prediction for TTS was also performed; experimental results showed that it outperformed the conventional regression-based prediction method. Lastly, an extension of the approach to incorporate initial and final duration modeling is presented. This leads to a better understanding of the relation between the companding factors of the initial and final duration models and those of the syllable duration model.
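The idea of multiplicative companding factors can be illustrated with a small simulation. The sketch below is not the paper's EM procedure: it swaps in a simple alternating least-squares fit in the log domain, and every name and number in it (the tone and position factor values, the 40-frame base duration, the noise level, the function names) is a hypothetical choice made only for illustration. It shows how dividing each observed duration by its estimated factors can shrink the variance of what remains.

```python
import math
import random
from statistics import mean, pvariance

# Hypothetical ground-truth factors, used only to simulate data.
TRUE_TONE = {1: 1.00, 2: 1.05, 3: 1.20, 4: 0.90, 5: 0.70}
TRUE_POS = {"initial": 0.95, "medial": 1.00, "final": 1.30}


def simulate(n=2000, seed=0):
    """Draw durations (in frames) as base * tone factor * position factor * noise."""
    rng = random.Random(seed)
    tones = [rng.choice(list(TRUE_TONE)) for _ in range(n)]
    positions = [rng.choice(list(TRUE_POS)) for _ in range(n)]
    durations = [
        40.0 * TRUE_TONE[t] * TRUE_POS[p] * math.exp(rng.gauss(0.0, 0.05))
        for t, p in zip(tones, positions)
    ]
    return durations, tones, positions


def fit_factors(durations, tones, positions, n_iter=20):
    """Alternating log-domain least squares (an assumption, not the paper's EM)."""
    rows = list(zip((math.log(d) for d in durations), tones, positions))
    grand = mean(l for l, _, _ in rows)  # overall log-duration level
    tone_cf = {t: 0.0 for t in set(tones)}
    pos_cf = {p: 0.0 for p in set(positions)}
    for _ in range(n_iter):
        # Update each tone factor with the position factors held fixed.
        for t in tone_cf:
            tone_cf[t] = mean(l - grand - pos_cf[p] for l, tt, p in rows if tt == t)
        # Update each position factor with the tone factors held fixed.
        for p in pos_cf:
            pos_cf[p] = mean(l - grand - tone_cf[t] for l, t, pp in rows if pp == p)
    # Return the factors in the linear (multiplicative) domain.
    return (
        {t: math.exp(v) for t, v in tone_cf.items()},
        {p: math.exp(v) for p, v in pos_cf.items()},
    )


def normalize(durations, tones, positions, tone_cf, pos_cf):
    """Remove the estimated factor effects from each observed duration."""
    return [d / (tone_cf[t] * pos_cf[p]) for d, t, p in zip(durations, tones, positions)]


if __name__ == "__main__":
    durations, tones, positions = simulate()
    tone_cf, pos_cf = fit_factors(durations, tones, positions)
    normalized = normalize(durations, tones, positions, tone_cf, pos_cf)
    print("variance of observed durations:  ", pvariance(durations))
    print("variance of normalized durations:", pvariance(normalized))
```

The estimated factors are only determined up to a shared scale, so they should be compared relative to one another rather than read as absolute values.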
Similar resources
A new duration modeling approach for Mandarin speech
In this paper, a new duration modeling approach for Mandarin speech is proposed. It explicitly takes several major affecting factors as multiplicative companding factors (CFs) and estimates all model parameters with an EM algorithm. In addition, the three basic Tone 3 patterns (i.e., full tone, half tone and sandhi tone) are also properly considered by using three different CFs to separate their aff...
Use of syllable center detection for improved duration modeling in Chinese Mandarin connected digits recognition
This paper describes practical approaches for improving Mandarin digit recognition accuracy, especially in cars. We consider syllable and subword unit durations as an additional source of information. The explored approach was realized in two stages. First, the system performs standard speech recognition using acoustic spectral features. As a result, an n-best list of hypotheses is generated. In t...
A New Approach of Using Temporal Information in Mandarin Speech Recognition
In this paper, a new approach of using temporal information to assist in Mandarin speech recognition is discussed. It incorporates two types of temporal information into the recognition search. One is a statistical syllable duration model which considers the influences of 411 base syllables, 5 tones, 4 position-in-word factors, and 3 position-in-sentence factors on syllable duration. Another is t...
An Approach to Affective-Tone Modeling for Mandarin
Mandarin is a typical tone language in which a syllable possesses several tone types. While these tone types have rather clear manifestations in the fundamental frequency contour (F0 contour) in isolated syllables, they vary considerably in affective speech due to the influences of the speaker’s mood. In the paper the Fujisaki model based on the measured F0 contour is modified to adapt for affe...
Issues in Text-to-Speech Conversion for Mandarin
Research on text-to-speech (TTS) conversion for Mandarin Chinese is a much younger enterprise than comparable research for English or other European languages. Nonetheless, impressive progress has been made over the last couple of decades, and Mandarin Chinese systems now exist which approach, or in some ways even surpass in quality available systems for English. This article has two goals. The...